"OpenStreetMap lights up the world for blind users with sound" by Erik Schlegel Live captioning by Norma Miller. @whitecoatcapxg >> My I'm is Erik, and I'm here to talk about with the emergence of mobile devices and GPS sensors going into the devices, the need to have more elastic and available and open geospatial data is more imperative now than ever. So a little about me. I'm a senior engineer at Microsoft who works out of the New York City location. I get to work with Microsoft's most strategic partners and toughest challenges and problems and all the solutions that our team engineers are general purpose and open source, other Microsoft partners and other developers and other partners that we have can take those solutions and apply it to their own scenario.S. We're huge open source enthusiasts among our team. About 95, 99% of the products we work on is eventually open source. We're big proponents of open data solutions. My current role in this project was the backend architect and engineer for the cities unlocked project that I'll be talking about shortly and prior to this project I was one of the early contributors to React native for windows. So about two years ago, we partnered with the guide dogs association to take on the ambitious aspiration of building a navigation system for blind people. With city unlocked, we're rev we started to think how we can empower people to be more mobile and act in ways similar to what sighted people would when they went out on journeys, we wanted blind people to be more confident and minimize the fear to be able to go out and go on through a journey. So it's been quite a journey. And what did we build? We built a mobile navigation app that user interface is fully audio based. So what this is is it's an iOS app that runs in background mode and it sits in the user's pocket or bag and it's paired with a custom-made headset that hour team built and on the back of this headset we have head-tracking sensors so that it tracks where a person is looking, where they currently are and how fast they're moving and this was GPS sensor there's aby rometer, an accelerometer and a compass. So the headset is nearer audio, so the ear piece that's near the ear, it doesn't block the ear, because you can't block the ear foreplanned people. They need to be able to hear ambient noise, and the head piece is paired with a custom remote. So there's a mic on the head piece, and there's a button that basically triggers on this remote here, that says to listen, to actually listen to voice commands and as a person changes his location or approaches new objects, new places or landmarks, the application will automatically call out that place, that landmark and it will call it out spatially relative to where the person is directly looking so for example if Central Park is on my right and I was sitting right here, it will call to me saying Central Park is 10 meters away but it would be from the right side. As I turned facing Central Park, immediately thereafter, it would then call out that same phrase, depending on hw far I was, as if it was standing right in front of me. So just to see how everything works I'm going to demonstrate the app. So here I just set my location to right in front of Central Park. Heading north on Central Park south. >> -- five meters south from the Central Park south intersection. >> So the voice is all configurable. The dialect is configurable. User has the option of configuring whatever dialect they want. The command I just invoked was where am I. 
>> So it tells me where I am: the street I'm on, and how far I am from the nearest intersection.

>> Voice: No description available. Central Park.

>> So now I have --

>> Voice: Middle upper Manhattan, within New York City. Central Park is the most visited urban park in the United States, with 40 million visitors in 2013. Crossing within 30 meters, USS Maine monument around 75 meters, Trump International --

>> Hush. [laughter]

>> It's a lot of information it gives out. But with any object that's called out, the user has the ability to ask for more information, and for that particular object, if there's a linkage between that OpenStreetMap object and Wikipedia, we pull in the first three sentences of the Wikipedia article for that object, so people have context for what's actually in front of them (a sketch of that lookup is shown below).

>> Voice: Warning: GPS information is restricted. Use information with caution.

>> So if the user has poor GPS accuracy, we will call that out. Now I'm going to do a turn-by-turn navigation.

>> Take me to a cafe.

>> Voice: And you're off. Don't forget that this is a work in progress. You are responsible for your safety. Always use your best judgment and mobility skills. Walking to Coffee Cake Corner on 6th Avenue. ETA is about 35 minutes. In about 30 meters, turn --

>> So the click-clack sound that you just heard: when a person asks to be taken to a particular place and we get turn-by-turn navigation instructions, we actually create a track from that person's current position to the end point, and if the person veers off track, that click-clack sound gets stronger, to guide the person back on track. Think of it like following the light of a lighthouse, the beacon of a lighthouse, to get to that final point. The app we built is very similar to -- you could think of the analogy of being in a museum with a head piece on that tells you about each painting as you approach it. We are trying to paint the world for the user, trying to paint it with sound -- to give OpenStreetMap objects and nodes a voice.

So when a user first gets the box with the custom hardware and the remote, this is what they see. This is not the actual headset -- the headset is still in the Microsoft Garage; we're still working on some dev changes -- but as soon as you open the box, it will call out the instructions for how to set up the device. The remote control has commands bound to it, so there are buttons here that invoke a "more information" command or a "where am I" command. With the D-pad control here, you can click through and it will tell you what's in front of you by clicking up, what's on the left, what's on the right, and what's directly behind you. You can repeat a command. There's also a hush button, which as you can imagine comes in handy, and, as I said earlier, the listen button as well.

>> So most of you already know about OpenStreetMap, but just to go through the background: OSM came out in 2004, at a time when mapping data was controlled by large institutions and data was very expensive. Steve Coast created the first open-sourced map of the UK, and Steve actually used to work at Microsoft.
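For the Wikipedia lookup mentioned above, a rough sketch along these lines would work, assuming the OSM object carries a standard wikipedia=* tag such as "en:Central Park" and using the public MediaWiki TextExtracts API. The function and its parameters are my own illustration, not the project's actual code.

```python
import requests

# Sketch: resolve an OSM wikipedia=* tag (e.g. "en:Central Park") to the first
# few sentences of the corresponding Wikipedia article.

def wikipedia_intro(osm_wikipedia_tag, sentences=3):
    lang, _, title = osm_wikipedia_tag.partition(":")
    resp = requests.get(
        f"https://{lang}.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "prop": "extracts",
            "explaintext": 1,          # plain text instead of HTML
            "exsentences": sentences,  # only the first N sentences
            "titles": title,
            "redirects": 1,
            "format": "json",
        },
        timeout=5,
    )
    pages = resp.json()["query"]["pages"]
    return next(iter(pages.values())).get("extract", "")

print(wikipedia_intro("en:Central Park"))
```

The three-sentence limit mirrors what the talk describes; a production service would presumably cache these extracts alongside the indexed tile data rather than hitting Wikipedia on every callout.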
He worked at Bing for three years, and the initial dataset of OpenStreetMap was created by deciphering early Bing imagery, as well as Census Bureau data from the TIGER dataset. OpenStreetMap works very similarly to Wikipedia: the GPS coordinates are collected by the users and the data is then uploaded to OpenStreetMap. It's free out of the box, it's open sourced under the Open Database License, and there's no data approval body.

So just to briefly talk about the services powering this. We're making heavy use of the Cortana voice recognition services for text to speech, speech to text, and conversation as a service -- being able to create machine models so that when a person says "take me to a coffee house," we decipher that as "take me to a cafe." Cortana is at the forefront in terms of accurate voice recognition services. For the automatic callouts -- announcing the nearby points of interest and places -- we make heavy use of Mapzen's vector tile service. The routing, the turn-by-turn navigation, makes heavy use of Mapzen's turn-by-turn routing services, as they have an open source routing engine for that. Then there's forward and reverse geocoding and local search, to support arbitrary searches -- being able to say "take me to a French restaurant," or "take me to Starbucks," which will default to the nearest Starbucks, or "take me to Starbucks near Broadway," which will look near Broadway. As I mentioned, we make heavy use of third-party services: Mapzen, Cortana, OpenStreetMap, Wikipedia, and also the OpenCage geocoder. One of the most appealing parts of OpenStreetMap for us was the accessibility features: the ability to track the walkways and paths and alleys, to say how many steps are in a stairwell, or how well lit a sidewalk is -- to guide users onto a safer route.

So just to talk about what's under the hood. The way we implemented this was as a tile-based navigation platform: we pinpoint which tile the user is currently in. For anyone not familiar with tile-based maps, the world is divided up into tiles. The top level is one single tile, and as you go one level deeper, each parent tile is divided into four tiles, so a tile is identified by a row, a column and a zoom level. The deepest level is 19, which works out to roughly 275 billion tiles covering the world. So we figure out what tile the person is in, figure out all the nearby tiles (the tile math is sketched below), and then we use Mapzen's vector tile service to get the structured data for those tiles. Our services then take that data and index it in Elasticsearch. When we look at the nearby tiles and see that any of them are not in the index, we make a call out to Mapzen, get the data and cache it. We have a four-node Elasticsearch cluster to keep the search fast and available. As I mentioned, we integrate with Wikipedia. And the idea is that we want to take this whole platform, open source it, make it available on GitHub and offer it as a platform as a service, so anyone can take it and get started with querying OpenStreetMap data in an efficient way. Because a user is constantly changing their heading direction, the data that comes back has to be very fast.
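The tile lookup described above can be sketched with the standard OSM slippy-map tile formula. The helper names and the 3x3 neighborhood are my own assumptions for illustration, not necessarily how the platform actually picks nearby tiles.

```python
import math

# Standard OSM "slippy map" tile math: at zoom z the world is a 2^z x 2^z grid.

def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return the (x, y) tile coordinates containing a lat/lon at a given zoom."""
    lat_rad = math.radians(lat_deg)
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def neighbors(x, y, zoom):
    """The 3x3 block of tiles around the user's tile -- the set kept in the index."""
    n = 2 ** zoom
    return [((x + dx) % n, min(max(y + dy, 0), n - 1))
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)]

x, y = latlon_to_tile(40.7681, -73.9819, 16)   # near Central Park South
print(x, y, neighbors(x, y, 16))
```

At zoom 19 that grid is 2^19 by 2^19 tiles, which is where the roughly 275 billion figure comes from. Each (zoom, x, y) key can then be used to fetch a vector tile on demand, index it, and cache it, exactly as the caching flow above describes.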
Our users, our respondents, basically told us it has to come back within one tenth of a second. So we use Elasticsearch, which is a search server based on the Apache Lucene project; it's written in Java and it's open source. What it's able to do is partition the data into shards -- you're able to have, say, five different shards. A shard is a partition, so you divide up the computing, parallelize it across many shards, and then you can grow this out and scale the platform horizontally as the data grows; it's just a matter of putting a new node on the cluster and having the cluster discover and become aware of that node. The great thing about Elasticsearch is that it's fault tolerant and load balanced, and we rely on it heavily for our local search querying -- being able to do fuzzy-matching searches in case of misspellings (an example of that kind of query is sketched below). It's all fully powered by Azure.

The turn-by-turn navigation is all pedestrian-based routes. The way Mapzen's routing works is that it will first default to a pedestrian walkway, if one exists along your requested route; if it's not there, it will fall back to the road route. It's able to avoid steps and alleyways, and it's tailored to leverage the accessibility data that you map in OpenStreetMap.

So just to talk about the spatial audio: one of the most important things for vision-impaired people is that they use sound to orient themselves. Microsoft Research created a sound engine that simulates sound moving through 3D space by controlling the frequency of the sound to create the perception of distance variations. The way it works is that for the objects you want to render, you provide the distance and the direction that the object is relative to you. HRTF (head-related transfer function) is the name of the technique, and it essentially controls the differences in the sound traveling to your left ear and your right ear. I don't have time to actually demo this.

But how are we monitoring this? How do we ensure that these services are all operating well? We're using Application Insights to get insight into our telemetry -- tracking things such as the GPS accuracy on the device, the battery levels, how long a route took, whether a person veered off course on the route, how many people actually completed the route. Application Insights is a great way to gain transparency into the state of your app, who's using it and what cities they're using it in, and you're also able to set up web tests to be alerted when a certain dependency exceeds a threshold, so you can benchmark your services.

We do call out a lot, so in order to manage that, we map amenities in OpenStreetMap to super categories, and users are able to configure which super categories they want to automatically hear and which they don't. A super category might be location sense -- as I change my location, the app tells me -- or mobility, for mobility-specific amenities.

So why OpenStreetMap as opposed to other map providers? The update schedule is great: data propagates down and is available to us within about 20 minutes of a change in OpenStreetMap.
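To make the local-search piece concrete, here is a hedged sketch of the kind of Elasticsearch query that combines fuzzy name matching with a distance filter and a distance sort. The index name, field names and endpoint are invented for the example; they are not the project's actual schema.

```python
import requests

# Sketch: fuzzy-match a spoken place name against a hypothetical "pois" index
# (with "name" text field and "location" geo_point field) and keep only
# results near the user, sorted by distance.

user = {"lat": 40.7681, "lon": -73.9819}

query = {
    "size": 5,
    "query": {
        "bool": {
            "must": {
                "match": {
                    "name": {
                        "query": "starbuks",   # misspelling from speech recognition
                        "fuzziness": "AUTO"    # tolerate small edit distances
                    }
                }
            },
            "filter": {
                "geo_distance": {"distance": "500m", "location": user}
            }
        }
    },
    "sort": [
        {"_geo_distance": {"location": user, "order": "asc", "unit": "m"}}
    ]
}

resp = requests.post("http://localhost:9200/pois/_search", json=query, timeout=2)
for hit in resp.json()["hits"]["hits"]:
    print(hit["_source"]["name"], round(hit["sort"][0]), "meters away")
```

Setting fuzziness to AUTO lets Elasticsearch pick an edit distance based on term length, which is the sort of thing that catches speech-recognition misspellings like the one in the example.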
The data is very flexible in terms of the taxonomy and the data schema, and there's the accessibility story -- being able to track accessibility-related information in our spatial data and having our services cater to that. A flexible license, a thriving and growing community, and being able to support offline navigation.

The big challenge: we are heavily reliant on pedestrian routes -- pedestrian paths, sidewalks -- and the availability of that data is very, very sparse in OpenStreetMap. GPS accuracy is a challenge; GPS is not perfect, so if your GPS is inaccurate, the quality of the results is going to be inaccurate with it. I know I skimmed through that, sorry -- a lot to pack into a short talk -- so I'm open for questions.

>> AUDIENCE MEMBER: Is there a phone currently out on the market that provides a high enough level of GPS accuracy that it would make that work in, like, a city with buildings?

>> So the question is, are there phones out there that help alleviate some of the GPS accuracy issues? Dedicated GPS tracking devices are the ones with the highest level of accuracy. Otherwise you have to build algorithms to smooth out the accuracy issues -- as you move around, you can weigh the fixes at each point and smooth them out to figure out where you are (one simple smoothing approach is sketched below). We are working on integrating those algorithms into the platform, but a particular device? No, I haven't seen anything out there in a regular phone. It's very limited.

>> AUDIENCE MEMBER: [speaking off mic] [inaudible]

>> So there are a couple of parts to that question. One part is, how do we test this and how do we validate the user experience? We're in user trials right now. The user experience design actually came from blind people: our trial users were blind users, and we did a thorough research study with blind users who have guide dogs. They're in the midst of testing this right now. With the 3D audio, because blind people have such an innate, detailed perception of sound, having the sound emanate from different directions gives a much clearer picture of where an object is. Without that, it would be very difficult to create a user experience that would work for the target audience, because they have to have some sense of where these objects are.

>> AUDIENCE MEMBER: [inaudible]

>> Yeah, so the question is, how would you use this for transportation modes outside of pedestrian walking -- for buses? Right now we just support the pedestrian walkway scenario. Buses are challenging, because you'd have to tell the user when a bus is literally arriving. We tried that before with beacons, and it was a bit of a disaster -- a disaster because in order for blind users to hear that things are happening as they happen, it has to be real time; there can't be any delay. So buses were just really hard. If you have something like Transitland that does have minute-by-minute updates, that is something we would seriously love to engage with and explore.

>> AUDIENCE MEMBER: [inaudible]

>> We haven't. We've had our hands full with just doing the pedestrian walkway, but transit is the next phase. It is on the roadmap, and this is not just limited to blind people -- the long-term goal is to open this up for sighted people as well.
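On the GPS-smoothing point in the answer above, one simple approach -- purely illustrative, and not necessarily what the team integrated -- is an exponentially weighted average of fixes in which low-accuracy readings pull the estimate less.

```python
# Sketch: smooth a stream of GPS fixes, trusting fixes with a large reported
# accuracy radius (e.g. urban canyons) less than tight ones.

def smooth_fixes(fixes, base_alpha=0.5):
    """fixes: list of (lat, lon, accuracy_m). Returns a smoothed (lat, lon) track."""
    est_lat, est_lon = fixes[0][0], fixes[0][1]
    track = []
    for lat, lon, acc in fixes:
        # Shrink the blend factor when the reported accuracy is poor.
        alpha = base_alpha * min(1.0, 10.0 / max(acc, 1.0))
        est_lat += alpha * (lat - est_lat)
        est_lon += alpha * (lon - est_lon)
        track.append((est_lat, est_lon))
    return track

raw = [(40.76810, -73.98190, 5), (40.76825, -73.98170, 30), (40.76815, -73.98185, 8)]
print(smooth_fixes(raw))
```

Real implementations often use a Kalman filter and fold in the accelerometer and compass readings as well, but the weighting idea is the same.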
>> AUDIENCE MEMBER: [inaudible]

>> Not really -- well, it is, but it's on the plan. One of the things is that there's another project at Microsoft that has a camera on a pair of sunglasses, and it's able to decipher those objects. The idea is to pair with that and eventually merge the two. Eventually, yes, but it's not an immediate plan. There's another question in the back, I believe.

>> AUDIENCE MEMBER: [inaudible]

>> So the question is, do we plan on having any interactive features to report invalid data that users have encountered? We do have a mechanism in the app where you can rate the quality of the data coming back, but we don't expose that to users -- it's more for our user experience team -- just because we worry it might be a little too challenging for blind users to use. But we do have that feature for sighted users.

>> AUDIENCE MEMBER: [inaudible]

>> Yes, so the question was about supporting partially --

>> Low vision.

>> Low vision, users with low vision. Yes, we have definitely spoken about that. The features can be extended to support it. Given that we have our cognitive services and voice recognition capabilities, as well as spatial audio, it's a matter of just tailoring the user experience to support that.

>> Yes?

>> AUDIENCE MEMBER: So I know your demonstration just showed references to -- [inaudible]

>> Right, so the question was about compass directions and their relevance to our target audience, blind users -- how can they decipher that? We say that just to establish context. Some users can figure it out, but that's where the spatial audio piece comes in: having the audio come from the direction the object is in, and the click-clack sound to guide them along. They rely on that quite heavily. The compass direction is just used for detailed context about where things are located relative to the user -- for example, whether an intersection is straight ahead, on the right, or 10 meters east of your current location.

>> Do we have time for more questions? Any more questions?

>> AUDIENCE MEMBER: [inaudible]

>> So the question is, have we considered creating a website to improve the quality of the data in OpenStreetMap? Next week we have a hackfest with some of the guys on the team where we're using aerial imagery and street-view imagery, and the goal will be to take those deciphered walking paths and contribute them back to OpenStreetMap, so we have more accurate routes. Okay? Well, thanks, everyone, for your time. [applause]